How can we measure machine translation quality?

Author

  • Christian Federmann
Abstract

In this opinion paper, we describe our research on machine translation evaluation approaches that include mechanisms for human feedback and are designed to allow partial adaptation of the translation models being evaluated. While a plethora of automatic evaluation metrics for machine translation exists, their output in terms of scores, distances, etc. is often neither transparent to translators nor well correlated with manual evaluation by human experts. Even worse, machine translation tuning efforts based on these automatic metrics shift the research focus, to a certain extent, in the wrong direction: away from "good" translations and towards those with "higher scores". This further widens the gap between machine translation research and translation producers or users. We first describe several automatic metrics used in current machine translation research. Afterwards, we provide a brief overview of the manual evaluation techniques used in our machine translation group. As minimum error rate training for tuning (statistical) machine translation systems is an important part of the workflow, we think that a (semi-)automatic implementation of such evaluation tasks would be a helpful extension of current state-of-the-art machine translation systems. We conclude by describing the need to shift from automated metrics to consumer-oriented, semi-automatic evaluation, as this seems highly important for more advanced MT techniques to see wider acceptance and usage in real-life applications.
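To make the criticism of automatic metrics concrete, below is a minimal sketch of n-gram-precision scoring in the style of BLEU. It is a deliberate simplification written for this page (single reference, no smoothing) and not the author's implementation; the example sentences are invented. It illustrates the paper's point: an adequate paraphrase can score near zero because it shares few surface n-grams with the reference, while a translation with one wrong word scores highly.

```python
# Minimal sketch of n-gram-precision scoring in the BLEU family.
# Simplified: single reference, no smoothing.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def simple_bleu(hypothesis, reference, max_n=4):
    hyp, ref = hypothesis.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped matches
        total = max(sum(hyp_ngrams.values()), 1)
        if overlap == 0:
            return 0.0  # a zero n-gram precision zeroes the geometric mean
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty discourages gaming the metric with very short outputs.
    bp = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))
    return bp * math.exp(sum(log_precisions) / max_n)

ref = "the committee approved the proposal yesterday"
# Fluent paraphrase: scores 0.0 despite being adequate.
print(simple_bleu("yesterday the panel accepted the plan", ref))
# One wrong content word: scores around 0.76.
print(simple_bleu("the committee approved the proposal today", ref))
```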


Similar resources

Confidence Estimation of Machine Translation Output Using New Structural and Content-Based Features

Despite machine translation's (MT) wide success over the last years, this technology is still not able to translate text exactly, so that except for some language pairs in certain domains, post-editing its output may take longer than human translation. Nevertheless, by having an estimation of the output quality, users can manage the imperfection of this technology. It means we need to estimate the c...
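As a generic illustration of output-quality estimation (not the features or model of the cited paper), the sketch below fits a regressor on a few hand-crafted features of a source/translation pair. All feature choices and toy data are assumptions made for this example.

```python
# Hypothetical sketch of sentence-level quality estimation: predict a
# human quality score from simple features of the source and MT output.
from sklearn.linear_model import Ridge

def features(source, translation):
    src, tgt = source.split(), translation.split()
    return [
        len(tgt) / max(len(src), 1),            # length ratio
        sum(t == s for t, s in zip(tgt, src)),  # rough untranslated-token proxy
        len(set(tgt)) / max(len(tgt), 1),       # type/token ratio of the output
    ]

# Tiny invented training set: (source, MT output, human score in [0, 1]).
data = [
    ("das haus ist klein", "the house is small", 0.9),
    ("das haus ist klein", "the house is house", 0.3),
    ("er kommt morgen an", "he arrives tomorrow", 0.95),
    ("er kommt morgen an", "he comes morning on", 0.4),
]
X = [features(s, t) for s, t, _ in data]
y = [score for _, _, score in data]

model = Ridge(alpha=1.0).fit(X, y)
print(model.predict([features("das haus ist klein", "the house small")]))
```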


Exploring Prediction Uncertainty in Machine Translation Quality Estimation

Machine Translation Quality Estimation is a notoriously difficult task, which lessens its usefulness in real-world translation environments. Such scenarios can be improved if quality predictions are accompanied by a measure of uncertainty. However, models in this task are traditionally evaluated only in terms of point estimate metrics, which do not take prediction uncertainty into account. We i...
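One simple, generic way to accompany a point prediction with a measure of uncertainty is a bootstrap ensemble, sketched below. The cited paper investigates probabilistic approaches; this snippet only illustrates the idea of reporting a mean together with a spread, and all data are invented.

```python
# Generic sketch (not the paper's method): attach uncertainty to a QE
# prediction via a bootstrap ensemble and report mean and std. deviation.
import random
from sklearn.linear_model import Ridge

def ensemble_predict(X_train, y_train, x_new, n_models=20, seed=0):
    rng = random.Random(seed)
    preds = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X_train)) for _ in X_train]  # bootstrap resample
        model = Ridge(alpha=1.0).fit([X_train[i] for i in idx],
                                     [y_train[i] for i in idx])
        preds.append(model.predict([x_new])[0])
    mean = sum(preds) / len(preds)
    std = (sum((p - mean) ** 2 for p in preds) / len(preds)) ** 0.5
    return mean, std  # point estimate plus an uncertainty band

# Toy feature vectors and human scores, invented for illustration.
X = [[1.0, 0, 0.9], [1.0, 2, 0.5], [0.8, 0, 1.0], [1.2, 1, 0.6]]
y = [0.9, 0.3, 0.95, 0.4]
print(ensemble_predict(X, y, [1.0, 1, 0.7]))
```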


Achieving Human Parity on Automatic Chinese to English News Translation

Machine translation has made rapid advances in recent years. Millions of people are using it today in online translation systems and mobile applications in order to communicate across language barriers. The question naturally arises whether such systems can approach or achieve parity with human translations. In this paper, we first address the problem of how to define and accurately measure hum...


Findings of the 2012 Workshop on Statistical Machine Translation

This paper presents the results of the WMT12 shared tasks, which included a translation task, a task for machine translation evaluation metrics, and a task for run-time estimation of machine translation quality. We conducted a large-scale manual evaluation of 103 machine translation systems submitted by 34 teams. We used the ranking of these systems to measure how strongly automatic metrics cor...
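The meta-evaluation step described here, measuring how strongly an automatic metric correlates with human judgments, can be sketched as a rank correlation over system-level scores. The numbers below are invented; `scipy.stats.spearmanr` is used as the correlation routine.

```python
# Sketch of metric meta-evaluation in the style of the WMT shared tasks:
# correlate a metric's system-level scores with human rankings.
from scipy.stats import spearmanr

metric_scores = [27.4, 25.1, 31.0, 22.8, 29.3]  # e.g., BLEU per system (invented)
human_scores = [0.61, 0.58, 0.70, 0.40, 0.59]   # e.g., win rate in manual ranking

rho, p_value = spearmanr(metric_scores, human_scores)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.2f})")
```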


How Grammatical is Character-level Neural Machine Translation? Assessing MT Quality with Contrastive Translation Pairs

Analysing translation quality with regard to specific linguistic phenomena has historically been difficult and time-consuming. Neural machine translation has the attractive property that it can produce scores for arbitrary translations, and we propose a novel method to assess how well NMT systems model specific linguistic phenomena such as agreement over long distances, the production of novel w...
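The evaluation protocol can be sketched as follows: a model passes a contrastive instance when it scores the correct translation above a minimally corrupted variant. The scorer below is a toy stand-in for an NMT model's log-probability of the target given the source, and all example pairs are invented.

```python
# Sketch of contrastive-pair evaluation: accuracy is the fraction of pairs
# where the model prefers the correct translation over the corrupted one.
def accuracy(model_score, pairs):
    hits = sum(
        model_score(src, good) > model_score(src, bad)
        for src, good, bad in pairs
    )
    return hits / len(pairs)

# Toy stand-in scorer: penalises a few hand-listed agreement errors.
BAD_BIGRAMS = {("they", "has"), ("he", "see")}
def toy_score(src, tgt):
    toks = tgt.split()
    return -sum((a, b) in BAD_BIGRAMS for a, b in zip(toks, toks[1:]))

pairs = [
    ("sie haben das Buch gelesen",
     "they have read the book", "they has read the book"),
    ("er sieht den Hund",
     "he sees the dog", "he see the dog"),
]
print(accuracy(toy_score, pairs))  # 1.0 for this toy scorer
```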


Findings of the 2011 Workshop on Statistical Machine Translation

This paper presents the results of the WMT11 shared tasks, which included a translation task, a system combination task, and a task for machine translation evaluation metrics. We conducted a large-scale manual evaluation of 148 machine translation systems and 41 system combination entries. We used the ranking of these systems to measure how strongly automatic metrics correlate with human judgme...




Publication date: 2012